Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Hanin, Boris

Neural Information Processing Systems

We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. Our results show that the empirical variance of the squares of the entries in the input-output Jacobian of N is exponential in a simple architecture-dependent constant beta, given by the sum of the reciprocals of the hidden layer widths. When beta is large, the gradients computed by N at initialization vary wildly. Our approach complements the mean field theory analysis of random networks. From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos.
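The architecture constant is simple to illustrate: for hidden layer widths n_1, ..., n_d, beta = 1/n_1 + ... + 1/n_d. Below is a minimal Python sketch (not the authors' code) that estimates how strongly squared Jacobian entries fluctuate at initialization for a wide and a narrow ReLU net of equal depth; the He-style weight scaling (variance 2/fan-in) and all function names here are illustrative assumptions.

```python
import numpy as np

def beta(widths):
    """Architecture constant: sum of reciprocals of hidden layer widths."""
    return sum(1.0 / n for n in widths)

def jacobian_entry_sq(widths, n_in=10, rng=None):
    """Squared (0, 0) entry of the input-output Jacobian of a random ReLU net.

    Assumes He-style initialization (weight variance 2 / fan_in), a common
    choice for ReLU networks; the paper's exact scaling may differ.
    """
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(n_in)
    J = np.eye(n_in)          # running input-output Jacobian
    fan_in = n_in
    for n in widths:
        W = rng.standard_normal((n, fan_in)) * np.sqrt(2.0 / fan_in)
        pre = W @ x
        D = np.diag((pre > 0).astype(float))  # ReLU derivative mask
        J = D @ W @ J                          # chain rule through this layer
        x = np.maximum(pre, 0.0)
        fan_in = n
    return J[0, 0] ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for widths in ([100] * 10, [10] * 10):  # same depth; beta = 0.1 vs 1.0
        samples = np.array([jacobian_entry_sq(widths, rng=rng)
                            for _ in range(2000)])
        print(f"width={widths[0]} x depth={len(widths)}  "
              f"beta={beta(widths):.2f}  "
              f"var/mean^2 of J_00^2: {samples.var() / samples.mean() ** 2:.2f}")
```

With these widths, beta is 0.1 for the wide net and 1.0 for the narrow one, so the narrow net's squared Jacobian entries should show markedly larger relative variance across initializations, consistent with the exponential-in-beta behavior the abstract describes.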



Reviews: Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Neural Information Processing Systems

The paper gives a theoretical study of the exploding and vanishing gradient problem (EVGP) in deep fully connected ReLU networks. As criteria for whether the EVGP has been avoided, the paper proposes two conditions: annealed EVGP and quenched EVGP. It then shows that both criteria are met when the sum of the reciprocals of the layer widths is small (so ideally every layer should be wide). To confirm this empirically, the paper uses an experiment from a concurrent work. Comments: To motivate the formal study of the EVGP in deep networks, the authors cite papers that suggest examining the distribution of singular values of the input-output Jacobian.

